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1 1 . A method, comprising: 

2 buffering packet header and payload data corresponding to a plurality of 

3 inbound transmission control protocol (TCP) packets received at a destination 

4 machine in an inbound buffer; and 

5 performing TCP input processing of the packet header and payload data that 

6 is buffered in the inbound buffer via a multi-threaded hardware engine, wherein 

7 multiple hardware-arbitrated threads are concurrently executed by the multi- 

8 threaded hardware engine to process the plurality of inbound TCP packets. 

1 2. The method of claim 1 , further comprising arbitrating thread processing via a 

2 hardware-based scheduler. 

1 3. The method of claim 2, wherein arbitrating thread processing comprises 

2 performing at least one of thread suspension, thread scheduling, thread 

3 synchronizing, saving thread state and restoring thread state. 

1 4. The method of claim 1 , wherein the multi-threaded hardware engine 

2 comprises a dedicated TCP offload engine (TOE). 

1 5. The method of claim 1 , further comprising concurrently transferring payload 

2 data buffered in the inbound buffer to host memory while performing TCP input 

3 processing via the multi-threaded hardware engine. 
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1 6. The method of claim 5, further comprising performing a direct memory access 

2 (DMA) transfer to concurrently transfer the payload data to host memory. 

1 7. The method of claim 5, further comprising pre-posting memory locations in 

2 the host memory to which payload data is to be transferred. 

1 8. The method of claim 1 , further comprising: 

2 determining an existence of a TCP connection; 

3 generating TCP connection context data corresponding to the TCP 

4 connection; and 

5 storing the TCP connection context data in host memory. 

1 9. The method of claim 8, further comprising maintaining a cache in which 

2 selected TCP connection context data is cached. 

1 1 0. The method of claim 9, further comprising: 

2 retrieving the TCP connection context data for a given packet from one of 

3 host memory or the cache; 

4 loading the TCP connection context data into a working register; and 

5 processing the TCP connection context data via the multi-threaded hardware 

6 engine to perform TCP input processing. 

1 11. The method of claim 1 0, further comprising: 

2 performing a hash-based lookup against the cache to determine if the TCP 

3 connection context data for the given packet is present in the cache; and 

4 loading the TCP connection context data from the cache into the working 

5 register if the hash-based lookup results in a cache hit, otherwise copying the TCP 
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6 connection context data from host memory into the cache prior to loading the TCP 

7 connection data into the working register. 

1 12. A method comprising: 

2 generating transmission control protocol (TCP) connection context data 

3 corresponding to a TCP connection employed to transmit payload data stored in 

4 host memory from a host machine to a destination machine; and 

5 performing TCP output processing of the payload data stored in memory via a 

6 multi-threaded hardware engine running on the host machine, wherein multiple 

7 hardware-arbitrated threads are concurrently executed by the engine to generate a 

8 plurality of outbound TCP packets containing the payload data, each outbound TCP 

9 packet including a header containing TCP connection data corresponding to the 
10 TCP connection context data. 

1 13. The method of claim 12, further comprising arbitrating thread processing via a 

2 hardware-based scheduler. 

1 14. The method of claim 13, wherein arbitrating thread processing comprises 

2 performing at least one of thread suspension, thread scheduling, thread 

3 synchronizing, saving thread state and restoring thread state. 

1 15. The method of claim 12, wherein the multi-threaded hardware engine 

2 comprises a dedicated TCP offload engine (TOE). 

1 16. The method of claim 12, further comprising concurrently transferring data 

2 comprising outbound TCP packets from host memory to a network interface 
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3 controller (NIC) while performing TCP output processing via the multi-threaded 

4 hardware engine. 

1 17. The method of claim 16, further comprising performing a direct memory 

2 access (DMA) transfer to concurrently transfer the data comprising outbound TCP 

3 packets from the host memory to the NIC. 

1 18. The method of claim 17, further comprising maintaining a DMA transmit 

2 queue containing information defining how DMA transfers are queued. 

1 19. The method of claim 12, further comprising maintaining a cache in which 

2 selected TCP connection context data is cached. 

1 20. The method of claim 1 9, further comprising: 

2 retrieving the TCP connection context data for a given portion of payload data 

3 from one of host memory or the cache; 

4 loading the TCP connection context data into a working register; and 

5 processing the TCP connection context data via the multi-threaded hardware 

6 engine to perform TCP output processing. 



1 21 . The method of claim 20, further comprising: 

2 performing a hash-based lookup against the cache to determine if the TCP 

3 connection context data for the given portion of payload data is present in the cache; 

4 and 

5 loading the TCP connection context data from the cache into the working 

6 register if the hash-based lookup results in a cache hit, otherwise copying the TCP 
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7 connection context data from host memory into the cache prior to loading the TCP 

8 connection data into the working register. 

1 22. An integrated circuit, comprising: 



2 a multi-threaded transmission control protocol (TCP) offload engine (TOE), 

3 including: 

4 a processing engine; 

5 a scheduler, communicatively coupled to the processing engine; 

6 a host memory interface, communicatively coupled to the processing 

7 engine; and 

8 a network interface controller (NIC) interface; communicatively coupled 

9 to the processing engine; and 

10 a direct memory access (DMA) controller, communicatively coupled to the 



1 1 NIC interface and the host memory interface. 

1 23. The integrated circuit of claim 22, further comprising a cache communicatively 

2 coupled to the processing engine and the host memory interface. 

1 24. The integrated circuit of claim 22, further comprising a host interface 

2 communicatively coupled to the processing engine. 

1 25. The integrated circuit of claim 22, wherein the processing engine comprises: 

2 a pipelined arithmetic logic unit (ALU); 

3 a working register, communicatively coupled to the pipelined ALU; 

4 an instruction cache to store instructions executable by the pipelined ALU; 

5 and 
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6 an instruction register, communicatively coupled between the instruction 

7 cache and the pipelined ALU. 

1 26. The integrated circuit of claim 25, wherein the processing engine further 

2 includes a thread cache, communicatively coupled to the working register. 

1 27. The integrated circuit of claim 22, wherein the integrated circuit comprises a 

2 memory controller hub (MCH) in a platform chipset. 

1 28. A system, comprising: 

2 at least one processor, communicatively coupled to a frontside bus; 

3 host memory communicatively coupled to a memory bus; and 

4 a memory controller hub (MCH) communicatively coupled to the at least one 

5 processor via the frontside bus and the host memory via the memory bus, the MCH 

6 embodied as an integrated circuit comprising: 

7 a multi-threaded transmission control protocol (TCP) offload engine 

8 (TOE), including: 

9 a processing engine; 

10 a scheduler, communicatively coupled to the processing engine; 

1 1 a host memory interface, communicatively coupled to the 

12 processing engine and the memory bus; 

13 .a host interface, communicatively coupled to the processing 

14 engine and the frontside bus; 

15 a network interface controller (NIC) interface; communicatively 

16 coupled to the processing engine; and 

17 a direct memory access (DMA) controller, communicatively coupled to 

18 the NIC interface and the host memory interface. 
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1 29. The system of claim 28, further comprising a network interface controller 

2 (NIC), communicatively coupled to the NIC interface via one of a PCI (peripheral 

3 component interconnect) or PCI-X (PCI Express) bus. 

1 30. The system of claim 28, wherein the MCH further includes a cache 

2 communicatively coupled to the processing engine and the host memory interface. 
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